ABSTRACT

Good statistics for measuring large-scale structure in the Universe must be able 
to distinguish between different models of structure formation. In this paper, two 
and three dimensional "counts in cell" statistics and a new "discrete genus statistic"
are applied to toy versions of several popular theories of structure formation:
a random phase cold dark matter model, cosmic string models, and a global texture
scenario. All three statistics appear quite promising in terms of differentiating
between the models. 
1. Introduction

Recent redshift surveys extending to greater than $100\,h^{-1}$ Mpc seem, upon
visual inspection, to be dominated by voids, sheets and filaments^'^). Are these
structures real and are they significant? Are they consistent with currently popular 
models of structure formation? In order to answer these questions, we need statis- 
tical methods. Since most of the currently popular theories of structure formation 
predict a similar (namely scale invariant) spectrum of density perturbations^-*, we 
require statistical measures which can pick out the phase information which dis- 
tinguishes between the models. 

More specifically, we are interested in statistics which can clearly distinguish 
between the two most popular classes of theories: models based on random phase 
fluctuations produced during inflation (e.g., the cold dark matter (CDM) model)
on one hand and topological defect models on the other. A good statistic must 
also be able to differentiate between the various topological defect models - as 
representative examples we pick the global texture scenario, a model based on 
cosmic string wakes, and a filament model. 

In this paper we discuss three promising statistics: two and three dimensional 
counts in cell (CIC) statistics^^ and a discrete genus statistic. We apply these 
statistics to our toy models of structure formation and conclude that for sufficiently 
small observational error bars the statistics will be able to clearly differentiate 
between the models. 

In Section 2 we define our toy models. In Section 3 we define the "discrete 
genus statistic" and apply it to our toy models. In Section 4, we investigate the 
two dimensional and three dimensional CIC statistics. We compare the results 
of the two dimensional CIC statistic with the results for two slices of the CFA2
survey^). The final section contains a discussion of the results and ideas for future
work. 

This paper is based on senior theses by D.K.^) and S.R.^). 
2. Toy Models

The purpose of this paper is primarily to study the effectiveness of the statistics 
considered here at distinguishing different models of structure formation. At this 
stage we are not yet attempting to confirm or rule out concrete models. Hence, 
we will apply the statistics to toy models of structure formation. These models 
are designed to mimic key features of specific theories of structure formation, in 
particular the distinctive non-gaussian and topological aspects. 

We consider five models: a model in which galaxies are randomly distributed 
throughout the sample volume (Poisson model), a CDM model (without nonlinearities
taken into account), a cosmic string wake model, a cosmic string filament
model, and a global texture model. 

In order to study the dependence of the statistical measures on topology (rather 
than number density), we chose all topological defect models to contain the same 
number of structures, one per Hubble volume at teq, the time of equal matter and
radiation. All structures in a given model have the same mass, with the total mass 
chosen to give a spatially flat Universe.

Taking the structures to have the same size corresponds to a severe truncation 
of the power spectrum of the actual topological defect models. The justification for 
this truncation comes from the fact that structures produced at teq are dominant 
in both the global texture models and in cosmic string models in which the dark 
matter is hot. We will come back to this point below. 

The numerical simulations produce cubes of data whose side length is 200 Mpc
and each of which contains 222,400 galaxies, chosen such that the number of galaxies
per unit volume agrees roughly with the number density of the CFA2 survey^).

In the texture toy model, spherical balls of galaxies with Gaussian radial density
function were placed randomly in the sample volume. The standard deviation of
the Gaussian was taken to be 9 Mpc.
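
The construction just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the simulations: the function name, the number of textures, the number of galaxies per texture and the treatment of galaxies that fall outside the box (they are simply redrawn) are all hypothetical choices.

```python
import random

def texture_toy_model(n_textures, galaxies_per_texture,
                      box=200.0, sigma=9.0, seed=0):
    """Return (centres, galaxies) for a toy texture model.

    box and sigma are in Mpc. Every texture receives the same number of
    galaxies, mimicking structures of equal mass. Redrawing galaxies that
    land outside the box is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    centres = [tuple(rng.uniform(0.0, box) for _ in range(3))
               for _ in range(n_textures)]
    galaxies = []
    for cx, cy, cz in centres:
        placed = 0
        while placed < galaxies_per_texture:
            # Independent Gaussian offsets along each axis give a
            # spherically symmetric ball with a Gaussian density profile.
            p = (rng.gauss(cx, sigma), rng.gauss(cy, sigma), rng.gauss(cz, sigma))
            if all(0.0 <= c < box for c in p):
                galaxies.append(p)
                placed += 1
    return centres, galaxies

# Hypothetical example: 125 textures with 16 galaxies each.
centres, galaxies = texture_toy_model(n_textures=125, galaxies_per_texture=16)
```

Giving each ball the same number of galaxies reflects the requirement that all structures in a given model have the same mass.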

This toy model should provide a rough approximation for what happens in 
the actual texture model^7,8). In this theory, density perturbations are caused by
contracting topologically nontrivial scalar field configurations. There is a fixed 
probability p per Hubble volume that at any time t a nontrivial configuration will 
become smaller than the Hubble radius and start to contract at relativistic speeds, 
leading to a roughly spherical density perturbation. Hence, a fixed number p of
textures per Hubble volume per expansion time is created. Those produced before
teq are washed out by pressure, while those produced after teq have less time to grow
by gravitational instability. Hence, the most prominent texture-induced perturbations
are those laid down at teq.

The cosmic string model of galaxy formation^) still has many uncertain as- 
pects. It is known that the network of cosmic strings approaches a scale invariant 
distribution^^) , i.e., the distribution of strings looks statistically the same at all 
times provided all lengths are scaled by the Hubble radius. The cosmic string en- 
semble consists of a network of infinite strings with curvature radius comparable 
to time t, and a distribution of loops with radius smaller than t. Recent numer- 
ical simulations^^'' agree that the loops are subdominant. However, there is no 
agreement on the small scale structure on long strings. 

If long strings are straight on small scales, they will form planar density perturbations
called wakes^^) (see e.g., Ref. 13 for a recent review of structure formation
in the cosmic string model) of planar dimensions t × vt, where v is the velocity of
the string in its normal plane. However, if there is small scale structure on the long
strings, these strings will move slowly and will exert a local gravitational force on
the surrounding matter, leading to the formation of filaments^^^. Because of this
uncertainty in the string model we consider two cosmic string toy models, a "wake 
model" and a "filament model." They correspond to the two extreme cosmic string 
scenarios. 

In the wake model, rectangular prisms of length and width 40 Mpc and thickness
2 Mpc were placed randomly in the sample volume (subject to the constraint
that they lie entirely in the sample volume). The thickness corresponds to the
thickness of the nonlinear region around the wake for a cosmic string model with
hot dark matter and a mass per unit length $\mu$ given by $G\mu = 10^{-6}$, G being
Newton's constant^^-*. This value of $\mu$ is the preferred value based on large-scale
structure analyses^^^ and on the COBE cosmic microwave anisotropy results^^^.
Note that for h = 0.5, the planar dimensions of the wake correspond to the Hubble
radius at teq.

In the filament model, cylinders of length 60 Mpc and radius 4.1 Mpc were 
placed randomly in the sample volume. Galaxies were placed at random in the 
cylinders, as they were in the wake model. 
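
As an illustration of how such structures might be laid down, the following sketch places one wake and one filament and fills them with uniformly distributed galaxies. It assumes, purely for simplicity, that the prisms and cylinders are aligned with the coordinate axes; the text does not specify the orientations used, and all function names are hypothetical.

```python
import math
import random

BOX = 200.0  # side length of the sample cube in Mpc

def place_wake(rng, side=40.0, thickness=2.0, box=BOX):
    """Return (corner, dims) of a prism lying entirely inside the box."""
    dims = [side, side, side]
    dims[rng.randrange(3)] = thickness  # a random axis carries the thin direction
    corner = [rng.uniform(0.0, box - d) for d in dims]
    return corner, dims

def galaxy_in_wake(rng, corner, dims):
    """Draw one galaxy position uniformly inside the prism."""
    return tuple(c + rng.uniform(0.0, d) for c, d in zip(corner, dims))

def place_filament(rng, length=60.0, radius=4.1, box=BOX):
    """Return (axis, start); start is the centre of one end cap."""
    axis = rng.randrange(3)
    # Keep the cylinder axis at least one radius away from the side walls.
    start = [rng.uniform(radius, box - radius) for _ in range(3)]
    start[axis] = rng.uniform(0.0, box - length)
    return axis, start

def galaxy_in_filament(rng, axis, start, length=60.0, radius=4.1):
    """Draw one galaxy position uniformly inside the cylinder."""
    # The square root makes the points uniform in area over the cross-section.
    r = radius * math.sqrt(rng.random())
    phi = rng.uniform(0.0, 2.0 * math.pi)
    point = list(start)
    point[axis] += rng.uniform(0.0, length)
    u, v = [a for a in range(3) if a != axis]
    point[u] += r * math.cos(phi)
    point[v] += r * math.sin(phi)
    return tuple(point)

rng = random.Random(0)
corner, dims = place_wake(rng)
wake_galaxies = [galaxy_in_wake(rng, corner, dims) for _ in range(100)]
axis, start = place_filament(rng)
filament_galaxies = [galaxy_in_filament(rng, axis, start) for _ in range(100)]
```

Drawing the corner (or end cap) from a range shortened by the size of the structure enforces the constraint that each structure lies entirely in the sample volume without any rejection step.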

The linear CDM model was constructed by starting from the power spectrum^^^
with

$$ \beta = 1.7\,(\Omega_0 h^2)^{-1}\,{\rm Mpc}, \qquad \omega = 9\,(\Omega_0 h^2)^{-1.5}\,{\rm Mpc}^{1.5}, \qquad \gamma = 1\,(\Omega_0 h^2)^{-2}\,{\rm Mpc}^{2} , \qquad (2) $$

Fourier transforming to position space, and laying down galaxies according to the
position space density distribution. The transition from Fourier space to position
space was done by taking the lowest $50^3$ Fourier modes (corresponding to the
sample volume) in the first octant of Fourier space, choosing random phases for
all of these modes, evaluating the Fourier transform at $50^3$ cell centers $x_{ijk}$
in position space, calculating the number of galaxies in cell (ijk) according to
$\rho(x_{ijk})$, and laying down the galaxies at random in the cell (for details see Ref.
6).
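
A scaled-down sketch of this procedure is given below. It rests on several assumptions that are not taken from the text: the power spectrum is taken to have the commonly used CDM fitting form $P(k) \propto k / (1 + \beta k + \omega k^{1.5} + \gamma k^2)^2$, the overall amplitude is arbitrary, and the mapping from the field to galaxy numbers (through the hypothetical `contrast` parameter) is only one possible choice. A grid of $6^3$ cells replaces the $50^3$ cells so that the direct mode sum stays cheap.

```python
import math
import random

def cdm_power(k, omega_h2=0.25, amplitude=1.0):
    """Assumed form P(k) = A k / (1 + beta k + omega k^1.5 + gamma k^2)^2, k in 1/Mpc."""
    scale = 1.0 / omega_h2  # (Omega_0 h^2)^-1; 0.25 corresponds to Omega_0 = 1, h = 0.5
    beta, omega, gamma = 1.7 * scale, 9.0 * scale ** 1.5, 1.0 * scale ** 2
    return amplitude * k / (1.0 + beta * k + omega * k ** 1.5 + gamma * k ** 2) ** 2

def random_phase_field(n=6, box=200.0, seed=0):
    """Sum the lowest n^3 first-octant modes with random phases at n^3 cell centres."""
    rng = random.Random(seed)
    dk = 2.0 * math.pi / box
    modes = []
    for a in range(n):
        for b in range(n):
            for c in range(n):
                if a == b == c == 0:
                    continue  # the k = 0 mode only shifts the mean
                kvec = (a * dk, b * dk, c * dk)
                kmag = math.sqrt(sum(q * q for q in kvec))
                phase = rng.uniform(0.0, 2.0 * math.pi)
                modes.append((kvec, math.sqrt(cdm_power(kmag)), phase))
    size = box / n
    field = {}
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x = ((i + 0.5) * size, (j + 0.5) * size, (k + 0.5) * size)
                field[(i, j, k)] = sum(
                    amp * math.cos(kv[0] * x[0] + kv[1] * x[1] + kv[2] * x[2] + ph)
                    for kv, amp, ph in modes)
    return field

def lay_down_galaxies(field, n_galaxies, n=6, box=200.0, contrast=0.5, seed=1):
    """Place galaxies at random inside cells, weighted by 1 + contrast * delta / rms."""
    rng = random.Random(seed)
    cells = list(field)
    rms = math.sqrt(sum(v * v for v in field.values()) / len(field))
    weights = [max(0.0, 1.0 + contrast * field[c] / rms) for c in cells]
    size = box / n
    chosen = rng.choices(cells, weights=weights, k=n_galaxies)
    return [tuple((idx + rng.random()) * size for idx in cell) for cell in chosen]

field = random_phase_field()
galaxies = lay_down_galaxies(field, n_galaxies=500)
```

Because every mode enters only through its amplitude and an independent random phase, the resulting field carries no phase correlations, which is the property that distinguishes this model from the topological defect models.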

3. Discrete Genus Statistic 

The first statistic we investigate is a variant (developed in Ref. 5) of the genus 
statistic which was proposed in 1986 by Gott et al.-^^^ as a method of gaining direct 
information about the topology of the galaxy distribution. 
For a compact surface S in $R^3$, the genus is defined as

$$ g = (\#\ {\rm of\ holes}) - (\#\ {\rm of\ disconnected\ components}) + 1 . \qquad (3) $$

By the Gauss-Bonnet theorem the genus can be computed as a surface integral of 
the Gaussian curvature k: 



$$ g = 1 - \frac{1}{4\pi} \int_S k \, dA . \qquad (4) $$

Given a smooth density distribution $\rho(x)$, the genus statistic is defined as the
curve $g(\rho)$, where $g(\rho)$ is the genus of the surface $\rho(x) = \rho$. For a random phase
density field, the genus curve can be calculated analytically^^):

$$ g(\nu) = N (1 - \nu^2) \exp(-\nu^2/2) , \qquad (5) $$

where $\nu$ is the number of standard deviations from the mean density, and N is a
constant which depends on the power spectrum. Note that $g(\nu)$ is peaked at $\nu = 0$
and is symmetric. For non-Gaussian models we expect a shift in the peak position
and a deviation from symmetry about $\nu = 0$.
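
The shape of the random phase curve in Eq. (5) is easy to tabulate. The short sketch below, with the normalization N set to 1 as an arbitrary assumption, evaluates the curve and checks the two properties used as a reference for the non-Gaussian models: the peak at $\nu = 0$ and the symmetry under $\nu \to -\nu$.

```python
import math

def genus_curve(nu, amplitude=1.0):
    """Random phase genus curve g(nu) = N (1 - nu^2) exp(-nu^2 / 2)."""
    return amplitude * (1.0 - nu * nu) * math.exp(-0.5 * nu * nu)

# Tabulate the curve between -3 and +3 standard deviations.
nus = [0.25 * i for i in range(-12, 13)]
table = [(nu, genus_curve(nu)) for nu in nus]
peak_nu = max(table, key=lambda row: row[1])[0]
```

The curve changes sign at $\nu = \pm 1$, so it is positive near the mean density and negative in the tails.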

The usual method-'^^) of applying the genus statistic to a distribution of galaxies
is to construct a smooth density field by smearing each galaxy with a Gaussian
distribution of width $\lambda$, where $\lambda$ is the smoothing length.

The choice of $\lambda$ is critical. $\lambda$ must be large enough that the density
distributions inside structures connect, but small enough that the topology of
the dominant structures is not lost. The results depend crucially on $\lambda$, and this is
a big disadvantage of the statistic.
To avoid the above problem, we use a "discrete genus statistic"^). Given a
volume limited redshift survey (or a simulated galaxy distribution), we divide the 
volume into cells of size smaller than that of the structures we are interested in 
probing but large enough such that the counts in cell are not dominated by shot 
noise. In our simulations we chose a cell size of 8 Mpc. 

Consider the polygonal surface S(n) which is the boundary of the complex of
cells each of which contains n or more galaxies. The genus g(n) of
this surface is

$$ g(n) = 1 - \frac{1}{2} (V - E + F) , \qquad (7) $$

where V, E and F are the number of vertices, edges and faces respectively.

The curve g(n) is the discrete genus curve. The surface S(n) can be regarded
as the surface with galaxy density n/(cell volume). Hence, we can plot g as a
function of the galaxy number density.
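
To make Eq. (7) concrete, the following sketch counts the vertices, edges and faces of the boundary of a set of cubic cells. It is a minimal illustration, not the implementation of Ref. 5: the function name and the dictionary input format are hypothetical, and cells that touch only along an edge or at a corner are counted naively, which can give half-integer values of g because the boundary is then not a proper surface.

```python
def discrete_genus(counts, threshold):
    """Genus of the boundary of all cells holding at least `threshold` galaxies.

    counts maps integer cell indices (i, j, k) to galaxy numbers. Cells
    missing from the dictionary are treated as empty.
    """
    filled = {cell for cell, m in counts.items() if m >= threshold}
    vertices, edges, faces = set(), set(), set()
    for i, j, k in filled:
        for axis in range(3):
            for side in (0, 1):
                neighbour = [i, j, k]
                neighbour[axis] += 1 if side else -1
                if tuple(neighbour) in filled:
                    continue  # face shared by two filled cells: not on the boundary
                # Lowest corner of this face in lattice coordinates.
                base = [i, j, k]
                base[axis] += side
                u, v = [a for a in range(3) if a != axis]
                corners = []
                for du, dv in ((0, 0), (1, 0), (1, 1), (0, 1)):
                    p = list(base)
                    p[u] += du
                    p[v] += dv
                    corners.append(tuple(p))
                faces.add((axis, tuple(base)))
                vertices.update(corners)
                for a, b in zip(corners, corners[1:] + corners[:1]):
                    edges.add(frozenset((a, b)))
    euler = len(vertices) - len(edges) + len(faces)
    return 1 - euler / 2

# A ring of eight cells around an empty centre has one hole and one component.
ring = {(i, j, 0): 3 for i in range(3) for j in range(3) if (i, j) != (1, 1)}
genus_of_ring = discrete_genus(ring, threshold=1)
```

Evaluating `discrete_genus` for a range of thresholds n gives the discrete genus curve g(n).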

The results of the numerical simulations are shown in Fig. 1. With the exception
of the CDM model, the data are averages over 20 independent simulations of each
model. The statistical error bars are smaller than the symbol sizes. The CDM
model results come from a single realization.

The most important conclusion we can draw from this investigation is that the 
discrete genus statistic is a very powerful discriminant between different models 
of structure formation. The genus curves for all topological defect models are 
highly asymmetrical about the mean number density, whereas the Poisson and 
CDM models are symmetrical. The width of the genus curve for the CDM model 
is larger than that for the Poisson model which reflects the degree of clustering 
in the simulation. The difference in the peak density is due to a slightly different 
normalization of the models. 

For the wake model the genus curve is positive. This is due to the many holes in
the interconnected network of wakes. In contrast, the genus curve for the texture
model is overwhelmingly negative since the distribution of galaxies is clumpy (no
holes and many disconnected components). The curve for the filament model lies
between the two extreme cases. 

4. Counts in Cell Statistics 

The counts in cell statistics^) are very simple. The sample volume is divided
into cells of equal volume. For each integer n, the number f(n) of occurrences of
cells with n galaxies is determined. The graph of f(n) as a function of n is the
counts in cell statistic (CIC). Counts in cell statistics have been studied extensively
by Saslaw and collaborators^^-', and more recently by Coles and Plionis^^-* for the
Lick galaxy catalog, by Coles et al.^^^ for CDM models, by Kaiser et al.^^^ for IRAS
galaxies, by Weinberg and Cole^^^ and by de Lapparent et al.^^^ in the context of
defining a percolation statistic.

For our three dimensional simulations, it is straightforward to evaluate the 
CIC. We divide the simulation box into $50^3$ cells, each on the average containing
about two galaxies. Simulations with a smaller number of cells showed more noise
whereas the range of n values with $f(n) \neq 0$ was too small for a greater number
of cells. 
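
The three dimensional CIC can be computed with a single pass over the galaxy positions. The sketch below is one straightforward way to do this; the function name and the list-of-tuples input are assumptions of the example, while the box size and the number of cells per side follow the values quoted above.

```python
from collections import Counter

def counts_in_cells(galaxies, box=200.0, cells_per_side=50):
    """Return f as a dict: f[n] is the number of cells holding n galaxies."""
    size = box / cells_per_side
    occupancy = Counter()
    for position in galaxies:
        # Clamp so that a galaxy exactly on the upper wall falls in the last cell.
        index = tuple(min(int(c / size), cells_per_side - 1) for c in position)
        occupancy[index] += 1
    f = Counter(occupancy.values())
    f[0] = cells_per_side ** 3 - len(occupancy)  # cells that received no galaxy
    return dict(f)

sample = [(1.0, 1.0, 1.0), (1.5, 1.2, 1.1), (150.0, 20.0, 199.9), (200.0, 0.0, 0.0)]
f = counts_in_cells(sample)
```

Two simple consistency checks are that the values of f(n) sum to the number of cells and that the sum of n f(n) equals the number of galaxies.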

The results of the simulations are shown in Figs. 2 and 3. As in Section 3, the 
results are averages over 20 simulations, except for the CDM model for which only
a single realization was considered. In the region of n values plotted in Fig. 2, the 
one sigma statistical error bars are of the size of the symbols, as is seen from the 
individual plots of Fig. 3. 

The most obvious conclusion is that the three dimensional CIC statistic
discriminates well between our toy models. The CIC curve for the texture model
has the longest tail, a reflection of the dense clusters of galaxies it contains. The 
length of the tail of the CIC curve decreases as the dimension of the structures 
of the model increases. This allows a clear distinction between the filament and 
wake models. All topological defect toy models considered here are more strongly 
clustered than the CDM model and hence have longer tails of the CIC statistic. 
We can also consider two dimensional CIC statistics. They are constructed
such that a comparison with data from the CFA2-'^) redshift survey is possible.
Slices of data designed to resemble the CFA2 slices were extracted from cubes of
simulated data by generating random orientations for the slices and selecting all
galaxies in the cube within the angular (120° × 6°) and radial ($100\,h^{-1}$ Mpc) bounds
of the slice. The slices were divided into $35^2$ cells of equal volume. Fewer cells per
side reduced the resolution and generated a lot of noise, whereas more cells per
side caused a significant shortening of the CIC curves. Considering cells of equal
area instead of volume would weight nearby and far away galaxies differently. This
explains our choices.
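
One way to obtain cells of equal volume in such a wedge is to use bins of equal width in the angle along the slice and of equal width in $r^3$ in the radial direction. The sketch below illustrates this choice; it is an assumption of the example rather than a statement of the binning actually used, and for simplicity the slice is fixed in the x-y plane of the observer instead of being randomly oriented. Positions and `r_max` must be given in the same units.

```python
import math

def slice_cell(position, observer, r_max=100.0, ra_width=120.0,
               dec_width=6.0, n_side=35):
    """Return the (angle_bin, radial_bin) of a galaxy, or None if outside the slice."""
    dx, dy, dz = (p - o for p, o in zip(position, observer))
    r = math.sqrt(dx * dx + dy * dy + dz * dz)
    if r == 0.0 or r >= r_max:
        return None
    ra = math.degrees(math.atan2(dy, dx))   # angle within the plane of the slice
    dec = math.degrees(math.asin(dz / r))   # angle out of the plane of the slice
    if abs(ra) >= ra_width / 2.0 or abs(dec) >= dec_width / 2.0:
        return None
    angle_bin = min(int(n_side * (ra + ra_width / 2.0) / ra_width), n_side - 1)
    # Equal steps in r^3 enclose equal volumes of the wedge.
    radial_bin = min(int(n_side * (r / r_max) ** 3), n_side - 1)
    return angle_bin, radial_bin

cell = slice_cell((50.0, 0.0, 0.0), observer=(0.0, 0.0, 0.0))
```

With this binning the innermost radial bin extends to about a third of `r_max`, which illustrates why the cells are not of equal area on the sky.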

In order to calculate the CIC statistic for the CFA data we must correct for 
the apparent magnitude limitation of the data set. This was done in two steps. 
First, a volume-limited subsample of the data with radial extent $100\,h^{-1}$ Mpc was
used. The volume thus selected includes most of the interesting structure in each 
of the CFA slices, but is small enough such that selection effects can be reliably 
corrected. 

Second, the number of galaxies in each cell was multiplied by a selection function
f(r), r being the distance of the cell from us, which corrects for the deficiency
of galaxies. This function f(r) can be determined from the Schechter luminosity
function^^)

$$ \phi(L)\,dL = \phi^* \left(\frac{L}{L^*}\right)^{\alpha} \exp(-L/L^*)\, d\!\left(\frac{L}{L^*}\right) , \qquad (8) $$

where $\phi(L)\,dL$ is the number density of galaxies with luminosity in the interval
[L, L + dL], and $\alpha$, $\phi^*$ and $L^*$ are parameters determined from the data. The CFA

data gives^^^ 

$$ \alpha = -1.1, \qquad \phi^* = 0.020\,h^3\,{\rm Mpc}^{-3}, \qquad M^* = -19.2, \qquad (9) $$
where M* is the absolute magnitude corresponding to L*.

The number density of galaxies which can be seen at a distance r given the 
apparent magnitude cutoff in the data is 



$$ \phi(r) = \int_{L(r)}^{\infty} \phi(L)\,dL , \qquad (10) $$

where L(r) is the luminosity which at distance r corresponds to the apparent
magnitude cutoff. The selection function f(r) is

$$ f(r) = \frac{\phi(r_0)}{\phi(r)} , \qquad (11) $$

where $r_0$ is a suitably chosen reference distance ($20\,h^{-1}$ Mpc in our case).
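
The selection function can be evaluated numerically from the Schechter form. In the sketch below the normalization $\phi^*$ cancels in the ratio and is therefore omitted. The apparent magnitude limit of 15.5 is an assumption of this example (it is not quoted in the text), K-corrections are ignored, and the function names are hypothetical. Distances are in units of $h^{-1}$ Mpc.

```python
import math

def schechter_integral(x_min, alpha=-1.1, x_max=60.0, steps=4000):
    """Integrate x^alpha exp(-x) dx from x_min to x_max, with x = L / L*."""
    # Substituting t = ln x samples the steep faint end evenly.
    t0, t1 = math.log(x_min), math.log(x_max)
    h = (t1 - t0) / steps

    def integrand(t):
        x = math.exp(t)
        return x ** (alpha + 1.0) * math.exp(-x)

    total = 0.5 * (integrand(t0) + integrand(t1))
    total += sum(integrand(t0 + i * h) for i in range(1, steps))
    return total * h  # trapezoidal rule

def limiting_luminosity(r, m_lim=15.5, m_star=-19.2):
    """L(r) / L* for a galaxy at the magnitude limit, r in h^-1 Mpc."""
    m_abs = m_lim - 25.0 - 5.0 * math.log10(r)  # distance modulus, r in Mpc
    return 10.0 ** (-0.4 * (m_abs - m_star))

def selection_function(r, r0=20.0):
    """f(r) = phi(r0) / phi(r), the weight applied to counts at distance r."""
    return (schechter_integral(limiting_luminosity(r0))
            / schechter_integral(limiting_luminosity(r)))

weights = {r: selection_function(r) for r in (20.0, 50.0, 100.0)}
```

By construction the weight equals one at the reference distance and grows with distance, since fewer galaxies lie above the magnitude limit farther away.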

The results of our simulations are shown in Figs. 4 and 5. In Fig. 4 the results 
are compared to the average of two CFA2 slices. The error bars of the individual
CFA2 data were determined from the uncertainty in the positions due to peculiar 
velocities. The statistical error bars of the numerical simulations are shown in Fig. 
5. 

The tendency of the two dimensional CIC curves is the same as for three dimen- 
sions: the texture curve has the longest tail, followed by the wake and filamentary 
models. All three defect models give rise to CIC curves with longer tails than 
the CDM model. However, the observational error bars are sufficiently large such 
that only the Poisson model is convincingly ruled out. A goodness-of-fit analysis shows that
the string filament model fits the data best, significantly better than the CDM 
model^) . 

At this stage, however, it is premature to draw conclusions about the validity 
of the various models of structure formation. The toy models are too naive to allow 
any such conclusion. The main lesson is that both two and three dimensional CIC 
statistics are good ways to analyze large-scale structure data and confront theory 
with observations. 
5. Discussion

We have studied the applicability of a discrete genus statistic and two and
three dimensional counts in cell statistics to distinguish the predictions of different 
models of structure formation. Most theories predict a similar power spectrum 
of density perturbations, and hence a good statistic must be able to pick out the 
non-random phases which differentiate between the models. 

We conclude that our three statistics give large differences when applied to toy 
models of structure formation. Topological defect models give rise to long tails in 
counts in cell statistics, the tail length increasing as the dimension of the prominent 
structure decreases. The discrete genus statistic is very sensitive to the topology of 
large-scale structure and shows a large difference between the texture and cosmic 
string wake toy models. 

We have applied the statistics to five toy models of structure formation: Poisson,
CDM, global texture, cosmic string wakes and cosmic string filaments. The
models are constructed to capture the important topological and statistical proper- 
ties of the "real" models on scales larger than the horizon at teq. On smaller scales, 
the topological defect toy models are too rough to give a good approximation to 
the actual models. 

In this paper we have only compared one set of data, namely two slices of the 
CFA2 redshift survey, with the toy models. In future work we plan to analyze more
data. We also plan to construct more realistic toy models for topological defect 
models which have the correct power spectrum on all scales and are normalized to 
agree with the CMB anisotropies measured on large angular scales by the COBE- 
DMR experiment^^-* . It will then be realistic to perform a detailed statistical 
comparison between toy models and observations. 
Figure Captions



Figure 1: The discrete genus statistic evaluated for the four models of 
structure formation considered in the text, and compared to the results for 
a Poisson distribution of galaxies. 

Figure 2: 3-d counts in cell statistic evaluated for the four toy models and 
for a Poisson distribution of galaxies. 

Figure 3: 3-d counts in cell statistic for the four toy models. One sigma 
statistical error bars are shown for the wake, filament and texture models. 

Figure 4: 2-d counts in cell for the filament, texture, Poisson and inflation-based
CDM models, compared to the mean f(n) for two CFA slices.

Figure 5: 2-d counts in cell (including statistical error bars) for wake, fila- 
ment, texture and CDM models. 